[Draft] | GPUAI-3720 - Integrate Universal GEMM into Grouped GEMM - Pt 1 #1800
base: develop
Conversation
Good work, but there is still a bunch of things to do. Keep in mind that CDataType is equivalent to EDataType; there's no need to duplicate them, just pick one, preferably EDataType.
Additionally, don't forget to check all necessary conditions for all GEMMs and infer from them a value that holds for every GEMM, like all_have_kbatch_gt_one or all_have_main_kblock_loop; these must be checked for all GEMMs.
You'd also have to calculate tail_number and verify that it's the same for all GEMMs.
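The per-group checks suggested above can be folded into single flags roughly as follows. This is a minimal sketch; GemmDesc, k_per_block, and the tail_number formula are illustrative assumptions, not CK's actual types or arithmetic:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-group GEMM descriptor; names are illustrative, not CK API.
struct GemmDesc
{
    int K;       // reduction dimension
    int k_batch; // split-K factor
};

// Infer a flag that holds for *all* groups (e.g. all_have_kbatch_gt_one):
// evaluate the condition per GEMM and AND the results together.
bool all_have_kbatch_gt_one(const std::vector<GemmDesc>& gemms)
{
    return std::all_of(gemms.begin(), gemms.end(),
                       [](const GemmDesc& g) { return g.k_batch > 1; });
}

// tail_number must likewise be computed per GEMM and verified to be identical
// across the whole group; the formula here is a placeholder, not CK's exact one.
int tail_number(const GemmDesc& g, int k_per_block)
{
    const int k_per_batch = (g.K + g.k_batch - 1) / g.k_batch;
    return k_per_batch % k_per_block;
}

bool all_have_same_tail_number(const std::vector<GemmDesc>& gemms, int k_per_block)
{
    if(gemms.empty())
        return true;
    const int first = tail_number(gemms.front(), k_per_block);
    return std::all_of(gemms.begin(), gemms.end(), [&](const GemmDesc& g) {
        return tail_number(g, k_per_block) == first;
    });
}
```

If any of these aggregate checks fails, the grouped kernel cannot use a single launch configuration for all groups, so IsSupportedArgument-style validation should reject the argument.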
include/ck/tensor_operation/gpu/device/impl/device_grouped_gemm_xdl_splitk_cshuffle.hpp
Proposed changes
This PR integrates Universal GEMM into Device Grouped GEMM.
Specifically, we:
- replace GridwiseGemm_bk0mk1_bk0nk1_mn_xdlops_v2r4r2 in device_grouped_gemm_xdl_splitk_cshuffle.hpp with GridwiseGemm_xdl_cshuffle_v3
- make the corresponding changes to struct Argument and struct Invoker
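The shape of that change can be sketched as below. All names here (GridwiseGemmV3, DeviceGroupedGemm, CheckValidity, IsSupported) are illustrative stand-ins, not CK's real interfaces: the device-level grouped GEMM is re-parameterized on the new gridwise kernel, Argument holds one gridwise argument per group, and Invoker validates every group before launch:

```cpp
#include <vector>

// Stand-in for GridwiseGemm_xdl_cshuffle_v3 (hypothetical, simplified).
struct GridwiseGemmV3
{
    struct Argument
    {
        int M, N, K, k_batch;
    };
    static bool CheckValidity(const Argument& a)
    {
        return a.M > 0 && a.N > 0 && a.K > 0 && a.k_batch > 0;
    }
};

// Stand-in for the device-level grouped GEMM (hypothetical, simplified).
template <typename GridwiseGemm>
struct DeviceGroupedGemm
{
    struct Argument // holds one gridwise argument per group
    {
        std::vector<typename GridwiseGemm::Argument> group_args;
    };

    struct Invoker // validates every group before launching the kernel
    {
        bool IsSupported(const Argument& arg) const
        {
            for(const auto& a : arg.group_args)
                if(!GridwiseGemm::CheckValidity(a))
                    return false;
            return true;
        }
    };
};
```

Swapping the gridwise kernel this way keeps the device-side grouped bookkeeping intact while delegating per-group validity and launch parameters to the new kernel's own Argument type.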
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- [ ] clang-format on all changed files
Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.